A Closest Fit Approach to Missing Attribute Values in Preterm Birth Data
Authors
Abstract
Recently, results of a comparison of seven successful methods for handling missing attribute values were reported. This paper describes experimental results for the three most successful of these seven methods. Two of them, based on the Closest Fit idea (searching the remaining data set for the closest-fitting case and replacing a missing attribute value with the corresponding known value from that case), were enhanced by 12 strategies: two different options for missing attribute value replacement, three different interpretations of missing attribute values, and two types of rules, certain and possible (2 × 3 × 2 = 12). Our results show that, for a given data set, the best method for handling missing attribute values should be selected individually by testing the two main methods: the Local Closest Fit method with four options (both types of missing attribute value replacement combined with two interpretations of missing attribute values), and the Most Common Value method for symbolic attributes combined with the Average Value method for numerical attributes, both restricted to a concept. All of these methods are local, i.e., restricted to a concept, which again indicates that local approaches are better than global ones.
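A minimal Python sketch may help make the Closest Fit idea concrete. It assumes that cases are lists of attribute values with `None` marking a missing value, plus a parallel list of concept labels. The distance measure (0 for equal values; 1 for differing symbolic values or when either value is missing; |x - y| / r for numbers, with r the range of known values of that attribute) is one common choice and is assumed here, as are all function names. With `restrict_to_concept=True` the search is limited to cases of the same concept (the local variant); with `False` the whole data set is searched (the global variant). The two replacement options mentioned above are not reproduced.

```python
from typing import List, Optional, Sequence, Union

Value = Union[str, float, None]  # None marks a missing attribute value


def attribute_ranges(cases: Sequence[Sequence[Value]]) -> List[Optional[float]]:
    """Range (max - min) of known numeric values per attribute; None for symbolic ones."""
    ranges: List[Optional[float]] = []
    for j in range(len(cases[0])):
        nums = [c[j] for c in cases if isinstance(c[j], (int, float))]
        ranges.append(max(nums) - min(nums) if nums else None)
    return ranges


def value_distance(a: Value, b: Value, r: Optional[float]) -> float:
    """Assumed per-attribute distance: 1 if either value is missing, 0 if equal,
    normalized absolute difference for numbers, 1 for differing symbols."""
    if a is None or b is None:
        return 1.0
    if a == b:
        return 0.0
    if r and isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return abs(a - b) / r
    return 1.0


def closest_fit(cases: Sequence[Sequence[Value]], concepts: Sequence[str],
                restrict_to_concept: bool = True) -> List[List[Value]]:
    """Replace each missing value with the known value of the closest case,
    optionally searching only among cases of the same concept."""
    ranges = attribute_ranges(cases)
    filled = [list(c) for c in cases]
    for i, case in enumerate(cases):
        for j, v in enumerate(case):
            if v is not None:
                continue
            best: Value = None
            best_d = float("inf")
            for k, other in enumerate(cases):
                if k == i or other[j] is None:
                    continue  # skip the case itself and cases that also lack attribute j
                if restrict_to_concept and concepts[k] != concepts[i]:
                    continue  # local variant: stay inside the concept
                d = sum(value_distance(x, y, r) for x, y, r in zip(case, other, ranges))
                if d < best_d:
                    best_d, best = d, other[j]
            filled[i][j] = best  # stays None if no candidate case knows attribute j
    return filled


# Toy data (illustrative only): attributes are [symbolic, numeric, symbolic].
cases = [["high", 2.0, None], ["high", 3.0, "yes"], ["high", 2.0, "no"], ["low", 9.0, "no"]]
concepts = ["preterm", "preterm", "fullterm", "fullterm"]
print(closest_fit(cases, concepts)[0][2])                             # yes
print(closest_fit(cases, concepts, restrict_to_concept=False)[0][2])  # no
```

In the toy data the globally closest case to case 0 belongs to the other concept, so the global and local variants fill in different values.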
Similar resources
Handling Missing Attribute Values in Preterm Birth Data Sets
The objective of our research was to find the best approach to handling missing attribute values in data sets describing preterm birth provided by Duke University. Five strategies were used for filling in missing attribute values, based on most common values and closest fit for symbolic attributes, averages for numerical attributes, and a special approach to induce only certain rules from spe...
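The most-common-value and average strategies listed above, together with the concept-restricted variant described in the abstract of the main paper, can be sketched in Python as follows. This is an illustration under stated assumptions: `None` marks a missing value, an attribute is treated as numerical when all of its known values are numbers, and the function name is hypothetical.

```python
from collections import Counter
from statistics import mean
from typing import List, Optional, Sequence, Union

Value = Union[str, float, None]  # None marks a missing attribute value


def most_common_or_average(cases: Sequence[Sequence[Value]],
                           concepts: Optional[Sequence[str]] = None) -> List[List[Value]]:
    """Fill each missing value with the average of the known values (numerical
    attributes) or the most common known value (symbolic attributes).
    If `concepts` is given, only cases from the same concept are consulted."""
    filled = [list(c) for c in cases]
    for i, case in enumerate(cases):
        for j, v in enumerate(case):
            if v is not None:
                continue
            pool = [c[j] for k, c in enumerate(cases)
                    if c[j] is not None and (concepts is None or concepts[k] == concepts[i])]
            if not pool:
                continue  # nothing known to copy from; leave the value missing
            if all(isinstance(x, (int, float)) for x in pool):
                filled[i][j] = mean(pool)
            else:
                filled[i][j] = Counter(pool).most_common(1)[0][0]
    return filled


# Toy data (illustrative only): one symbolic and one numeric attribute.
cases = [["high", None], ["high", 30.0], ["low", 40.0], [None, 50.0], ["low", 60.0], ["low", 70.0]]
concepts = ["preterm", "preterm", "fullterm", "fullterm", "fullterm", "fullterm"]
print(most_common_or_average(cases)[0][1])            # 50.0 (global average)
print(most_common_or_average(cases, concepts)[0][1])  # 30.0 (average within "preterm")
```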
A comparison of traditional and rough set approaches to missing attribute values in data mining
Real-life data sets are often incomplete, i.e., some attribute values are missing. In this paper we compare traditional, frequently used methods of handling missing attribute values, which are based on preprocessing, with another class of methods dealing with missing attribute values in which rule induction is performed directly on incomplete data sets, i.e., handling missing attribute values a...
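Rule induction performed directly on incomplete data is commonly built on characteristic sets and concept approximations: certain rules are induced from lower approximations and possible rules from upper approximations, matching the certain/possible distinction in the abstract of the main paper. The Python sketch below assumes two interpretations of missing values used in the rough-set literature, lost values (`?`) and "do not care" conditions (`*`); it does not claim to reproduce exactly the three interpretations used in the paper, and the function names are hypothetical. Cases are lists of symbolic values and a concept is a set of case indices.

```python
from typing import Sequence, Set, Tuple

LOST = "?"         # lost value: erased, so it matches no specified value of another case
DO_NOT_CARE = "*"  # "do not care": the value may be anything, so it matches every value


def characteristic_set(cases: Sequence[Sequence[str]], x: int, attrs: Sequence[int]) -> Set[int]:
    """K_B(x): indices of cases that cannot be distinguished from case x on attrs."""
    result = set(range(len(cases)))
    for a in attrs:
        v = cases[x][a]
        if v in (LOST, DO_NOT_CARE):
            continue  # a missing value of x does not restrict K(x, a)
        result &= {y for y, c in enumerate(cases) if c[a] == v or c[a] == DO_NOT_CARE}
    return result


def concept_approximations(cases: Sequence[Sequence[str]], concept: Set[int],
                           attrs: Sequence[int]) -> Tuple[Set[int], Set[int]]:
    """Lower approximation (source of certain rules) and upper approximation
    (source of possible rules) of a concept, built from characteristic sets."""
    ks = [characteristic_set(cases, x, attrs) for x in concept]
    lower = set().union(*(k for k in ks if k <= concept))
    upper = set().union(*ks)  # each K_B(x) contains x, so it always meets the concept
    return lower, upper


# Toy data (illustrative only): two symbolic attributes, four cases.
cases = [["high", "yes"], ["high", LOST], [DO_NOT_CARE, "no"], ["low", "no"]]
print(concept_approximations(cases, {0, 1}, [0, 1]))  # ({0}, {0, 1, 2})
print(concept_approximations(cases, {2, 3}, [0, 1]))  # ({2, 3}, {2, 3})
```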
Comparisons on Different Approaches to Assign Missing Attribute Values
A commonly used and naive solution for processing data with missing attribute values is to ignore the instances that contain missing attribute values. This method may neglect important information within the data: a significant amount of data can easily be discarded, and the discovered knowledge may not contain significant rules. Some methods, such as assigning the most common values or assigning ...
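To see how ignoring incomplete instances can discard far more data than is actually missing, consider this small Python sketch on toy data. It assumes `None` marks a missing value; the function names are hypothetical.

```python
from typing import List, Sequence, Union

Value = Union[str, float, None]  # None marks a missing attribute value


def drop_incomplete(cases: Sequence[Sequence[Value]]) -> List[List[Value]]:
    """Naive approach: keep only the cases that have no missing values."""
    return [list(c) for c in cases if all(v is not None for v in c)]


def missing_fraction(cases: Sequence[Sequence[Value]]) -> float:
    """Fraction of individual attribute values that are missing."""
    cells = [v for c in cases for v in c]
    return sum(v is None for v in cells) / len(cells)


# Toy data (illustrative only): three of twelve values are missing.
cases = [["high", 2.0, "yes"], [None, 3.0, "no"], ["low", None, "no"], ["low", 9.0, None]]
kept = drop_incomplete(cases)
print(missing_fraction(cases))     # 0.25
print(1 - len(kept) / len(cases))  # 0.75 of the cases are discarded
```

Only a quarter of the values are missing, yet three quarters of the cases are thrown away.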
Missing data imputation in multivariable time series data
Multivariate time series data are found in a variety of fields such as bioinformatics, biology, genetics, astronomy, geography, and finance. Many time series data sets contain missing data. Imputing missing data in multivariate time series is a challenging problem that needs careful consideration before learning from or predicting time series. Much research has been done on the use of diffe...
An Approach to Imbalanced Data Sets Based on Changing Rule Strength
This paper describes experiments with a challenging data set describing preterm births. The data set, collected at the Duke University Medical Center, was large and, at the same time, many attribute values were missing. However, the main problem was that only 20.7% of the total number of cases represented the important preterm birth class. Thus the data set was imbalanced. For comparison, we in...
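The title above refers to handling the imbalance by changing rule strength. Because the abstract is truncated, the following Python sketch is only an assumption of how such a scheme might look: rules that fully match a case vote for their concept with their strength, and the strengths of rules for a chosen concept are multiplied by a factor before voting. The `Rule` class, the `classify` function, the voting scheme, and the factor value are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass
class Rule:
    conditions: Dict[int, str]  # attribute index -> required value
    concept: str                # decision value the rule predicts
    strength: float             # e.g., number of training cases the rule matched


def classify(case: Sequence[str], rules: Sequence[Rule],
             multipliers: Optional[Dict[str, float]] = None) -> Optional[str]:
    """Sum the (possibly rescaled) strengths of fully matching rules per concept
    and return the concept with the largest total, or None if no rule matches."""
    multipliers = multipliers or {}
    support: Dict[str, float] = {}
    for r in rules:
        if all(case[a] == v for a, v in r.conditions.items()):
            weight = r.strength * multipliers.get(r.concept, 1.0)
            support[r.concept] = support.get(r.concept, 0.0) + weight
    return max(support, key=support.get) if support else None


# Toy rules (illustrative only).
rules = [Rule({0: "high"}, "fullterm", 12.0), Rule({1: "yes"}, "preterm", 5.0)]
case = ["high", "yes"]
print(classify(case, rules))                    # fullterm (12 vs 5)
print(classify(case, rules, {"preterm": 3.0}))  # preterm (15 vs 12)
```

In this sketch the multiplier is a free parameter; raising it shifts borderline cases toward the smaller class.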